Conversation

@sarutak
Member

@sarutak sarutak commented Feb 20, 2021

What changes were proposed in this pull request?

This PR proposes to rename map to mapAttr in dsl/package.scala.
This PR also removes the APIs that take Symbol-typed parameters.

Why are the changes needed?

With the Catalyst DSL (dsl/package.scala), we have two ways to represent attributes.

  1. Symbol literals (' syntax)
  2. $"" syntax, which is defined in the sql/catalyst module using a StringContext (both are sketched below).
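
For context, in catalyst test code the two styles look roughly like this (a sketch; the column name id and the val names are placeholders):

import org.apache.spark.sql.catalyst.dsl.expressions._

// (1) Symbol literal: 'id is converted by the implicit DslSymbol into an attribute helper,
//     so 'id.int builds AttributeReference("id", IntegerType).
val bySymbol = 'id.int

// (2) String interpolation defined in sql/catalyst: $"id" builds UnresolvedAttribute("id").
val byInterpolation = $"id"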

But they have problems.

Regarding symbol literals, the Scala community deprecated them in Scala 2.13. We could alternatively use the Symbol constructor but, what is worse, Scala will completely remove Symbol in the future (https://scalacenter.github.io/scala-3-migration-guide/docs/incompatibilities/dropped-features.html).

Although scala.Symbol is useful for migration, beware that it is deprecated and that it will be removed from the scala-library. You are recommended, as a second step, to replace them with plain string literals "xwy" or a dedicated class.

Regarding the $"" syntax, there are two problems.
The first problem is that the syntax conflicts with another $"" syntax defined in the sql/core module.
You can easily see the problem in the Spark shell.

import org.apache.spark.sql.catalyst.dsl.expressions._
val attr1 = $"attr1"

       error: type mismatch;
        found   : StringContext
        required: ?{def $: ?}
       Note that implicit conversions are not applicable because they are ambiguous:
        both method StringToColumn in class SQLImplicits of type (sc: StringContext): spark.implicits.StringToColumn
        and method StringToAttributeConversionHelper in trait ExpressionConversions of type (sc: StringContext): org.apache.spark.sql.catalyst.dsl.expressions.StringToAttributeConversionHelper
        are possible conversion functions from StringContext to ?{def $: ?}

The second problem is that we can't write $"attr".map(StringType, StringType), though 'attr.map(StringType, StringType) works.
This seems to be a bug in the Scala compiler and will be fixed in neither 2.12 nor 2.13 (scala/scala#7396): presumably, because $"attr" already yields an UnresolvedAttribute, the compiler finds the one-argument map inherited from TreeNode and does not fall back to the implicit conversion that provides the two-argument DSL map.

Actually, I'm working on replacing all the symbol literals with the $"" syntax in SPARK-34443 (#31569), and I found this problem in the following test files:

  • EncoderResolutionSuite.scala
  • ComplexTypeSuite.scala
  • ObjectExpressionsSuite.scala
  • NestedColumnAliasingSuite.scala
  • ReplaceNullWithFalseInPredicateSuite.scala
  • SimplifyCastsSuite.scala
  • SimplifyConditionalSuite.scala
[error] /home/kou/work/oss/spark-scala-2.13/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala:212:28: too many arguments (found 2, expected 1) for method map: (f: org.apache.spark.sql.catalyst.expressions.Expression => A): Seq[A]
[error]       $"a".map(StringType, StringType)).foreach { attr =>

As a solution to those problems, I propose to rename map to mapAttr.
With this solution, we can write "name".attr.mapAttr rather than $"name".map or 'name.map.
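
A rough sketch of what the renamed helper could look like on the attribute helpers in dsl/package.scala is shown below. The bodies are paraphrased from the existing map helpers and the trait name AttrHelperSketch is made up, so the actual diff may differ.

import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.types.{DataType, MapType}

trait AttrHelperSketch {
  def s: String  // attribute name supplied by the concrete implicit class

  // Renamed from map so it no longer collides with the one-argument TreeNode.map.
  def mapAttr(keyType: DataType, valueType: DataType): AttributeReference =
    mapAttr(MapType(keyType, valueType))

  def mapAttr(mapType: MapType): AttributeReference =
    AttributeReference(s, mapType, nullable = true)()
}

Per the description above, call sites would then read like "name".attr.mapAttr(StringType, StringType) instead of 'name.map(StringType, StringType).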

Does this PR introduce any user-facing change?

No. dsl/package.scala is for internal use.

How was this patch tested?

Modified the tests.

@github-actions github-actions bot added the SQL label Feb 20, 2021
@SparkQA

SparkQA commented Feb 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39895/

@SparkQA

SparkQA commented Feb 20, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39895/

@SparkQA

SparkQA commented Feb 20, 2021

Test build #135315 has finished for PR 31601 at commit 5b9c205.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Member Author

sarutak commented Feb 22, 2021

cc: @cloud-fan


// int type can be up cast to long type
- val attrs1 = Seq('a.string, 'b.int)
+ val attrs1 = Seq(attr("a").string, attr("b").int)
Contributor

@cloud-fan cloud-fan Feb 22, 2021

We already have "name".attr (see the implicit DslString); can we extend it to support "name".attr.string, or even shorter, "name".string?

Member Author

@sarutak sarutak Feb 22, 2021

Actually, it's not enough. "name".attr returns an UnresolvedAttribute, just like $"name", so "name".attr.map still fails.

Contributor

My general idea is: previously the DSL was based on symbols; can we update the DSL (those implicits) so that it's based on strings? Ideally we should simply replace 'abc with "abc" in the test code.

Member Author

That was my first idea, but I thought it might be confusing because "abc" with an implicit conversion doesn't look like an attribute. But the DSL is for internal use, so it might be acceptable. What do you think?

Member Author

Ideally we should simply replace 'abc with "abc" in the test code.

Yes, ideally it should work the way you suggested, but I found it's difficult.
DslString implements ImplicitOperators while DslSymbol implements ImplicitAttribute, and their implementations of expr conflict: DslString implements expr as a Literal, while DslSymbol implements it as an UnresolvedAttribute (see the sketch below).
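
A simplified sketch of the clash (paraphrased from memory, not the verbatim Spark code; the Sketch-suffixed names are made up):

import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}

object DslConflictSketch {
  // The String implicit treats the value as a literal expression ...
  implicit class DslStringSketch(val s: String) {
    def expr: Expression = Literal(s)  // "abc" means the string literal "abc"
  }
  // ... while the Symbol implicit treats it as an unresolved attribute.
  implicit class DslSymbolSketch(sym: Symbol) {
    def expr: Expression = UnresolvedAttribute(sym.name)  // 'abc means the column abc
  }
}

So simply changing 'abc to "abc" would silently turn a column reference into a string literal wherever both are accepted.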

So, the next option would be to introduce a new syntax like attr"" using StringContext, roughly as sketched below.
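
For reference, such an interpolator could be sketched as follows (hypothetical and not adopted here; the names AttrInterpolatorSketch and StringToAttr are made up):

import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute

object AttrInterpolatorSketch {
  implicit class StringToAttr(val sc: StringContext) extends AnyVal {
    // attr"a" would build UnresolvedAttribute("a"); the distinct method name avoids the $ ambiguity shown earlier.
    def attr(args: Any*): UnresolvedAttribute = UnresolvedAttribute(sc.s(args: _*))
  }
}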

I said in the description:

Another solution would be to introduce a new string interpolation syntax, but it's difficult to work around conflicts with other string interpolation syntaxes (like the way the $"" defined in catalyst and the one in core conflict), so I didn't adopt this solution.

But if this is internal usage only, it might be acceptable.

@SparkQA

SparkQA commented Feb 22, 2021

Test build #135348 has finished for PR 31601 at commit c87915d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39929/

@SparkQA

SparkQA commented Feb 22, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39929/

@SparkQA

SparkQA commented Feb 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39931/

@SparkQA

SparkQA commented Feb 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39931/

@sarutak
Member Author

sarutak commented Feb 22, 2021

cc: @dongjoon-hyun and @HyukjinKwon too for other people's opinion.

@SparkQA

SparkQA commented Feb 22, 2021

Test build #135351 has finished for PR 31601 at commit 9f78964.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

I agree that $"abc" is not great, as it conflicts with the well-known syntax defined in sql/core. But I do like the existing "abc".attr syntax better than the new one. Can we just rename map to mapAttr to avoid the Scala bug?

@sarutak
Member Author

sarutak commented Feb 24, 2021

Thanks for the advice. I renamed map to mapAttr. It's no longer symmetric with struct and array, but that should be acceptable since it's for internal use.
I also updated the description.

@SparkQA

SparkQA commented Feb 24, 2021

Test build #135412 has finished for PR 31601 at commit b306786.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak sarutak changed the title from "[SPARK-34484][SQL] Introduce a new syntax to represent attributes with the Catalyst DSL" to "[SPARK-34484][SQL] Introduce a new syntax to represent map types with the Catalyst DSL" on Feb 24, 2021
@SparkQA

SparkQA commented Feb 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40001/

@SparkQA

SparkQA commented Feb 24, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40001/

Contributor

@cloud-fan cloud-fan left a comment

LGTM, please update the PR title

@SparkQA

SparkQA commented Feb 26, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40071/

@SparkQA

SparkQA commented Feb 26, 2021

Test build #135490 has finished for PR 31601 at commit a9e3490.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Member Author

sarutak commented Feb 26, 2021

The newly added RemoveNoopUnionSuite seems to use symbol literals, which causes the build failure.
I'll fix it too.

@SparkQA

SparkQA commented Feb 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40073/

@SparkQA

SparkQA commented Feb 26, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40073/

@SparkQA

SparkQA commented Feb 26, 2021

Test build #135492 has finished for PR 31601 at commit e3dead4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40097/

@SparkQA

SparkQA commented Feb 26, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40097/

@SparkQA

SparkQA commented Feb 26, 2021

Test build #135517 has finished for PR 31601 at commit 0af19a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • sealed trait PartitionSpec extends LeafExpression with Unevaluable
  • trait V2PartitionCommand extends Command
  • case class TruncateTable(table: LogicalPlan) extends Command
  • case class TruncatePartition(
  • case class TruncatePartitionExec(

@sarutak
Member Author

sarutak commented Mar 1, 2021

@cloud-fan According to this, should we wait and see what comes out of the discussion?

@cloud-fan
Contributor

Is there a released Scala version that forbids the symbol literal syntax?

@sarutak
Member Author

sarutak commented Mar 1, 2021

No, but we get a warning with Scala 2.13.

EDIT: dotty seems not to support symbol literals.
https://scalacenter.github.io/scala-3-migration-guide/docs/incompatibilities/dropped-features.html

@cloud-fan
Contributor

OK, it seems the ship has already sailed. Let's wait a bit more, but I'm not sure we can change that decision...

@sarutak
Member Author

sarutak commented Mar 1, 2021

OK, it seems the ship has already sailed. Let's wait a bit more, but I'm not sure we can change that decision...

Yeah, I think so too. At the very least, symbol literals will no longer be supported, as Dotty has already removed them.
https://dotty.epfl.ch/docs/reference/dropped-features/symlits.html

@sarutak sarutak force-pushed the dsl-attr branch 2 times, most recently from e03dca2 to 506bf5b on March 3, 2021 04:29
@SparkQA

SparkQA commented Mar 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40273/

@SparkQA

SparkQA commented Mar 3, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40273/

@sarutak
Member Author

sarutak commented Mar 18, 2021

@cloud-fan Surprisingly, Scala 3 seems to implement a backward-compatibility feature for symbol literals (it may be a life-extension measure, however).
scala/scala3#11588

So, I think replacing the symbol literals might not be needed after all.
How about renaming map to mapAttr? Do you think that's not needed either?
If so, I'll just close this PR.

@cloud-fan
Contributor

cloud-fan commented Mar 18, 2021

If Scala keeps that backward-compatibility feature forever, then we don't need to do anything. Otherwise, we still need to prepare for the removal of the symbol syntax.

@sarutak
Member Author

sarutak commented Mar 18, 2021

If Scala keeps that backward-compatibility feature forever,

According to this, it seems to be a life-extension measure that will be removed in the future.
But I'm asking the Scala community.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jun 27, 2021
@github-actions github-actions bot closed this Jun 28, 2021